[Codegen][CPU] Fold reshape-containing encoding relayouts to map_store. #24533

Merged: bjacob merged 1 commit into main from users/bjacob/cpu-encoding-combine-layout, Jun 5, 2026

@bjacob (Collaborator) commented May 25, 2026

GPU's configuration pipeline folds the `pack`/`expand_shape`/`transpose` relayout chain from encoding materialization into a single `iree_linalg_ext.map_store` before tiling; CPU did not. For a non-row-major encoding swizzle the intervening `tensor.expand_shape` (not a `TilingInterface` op) blocks producer fusion and strands an untiled, whole-tensor `pack` intermediate.

Mirror GPU on CPU:

  • Add `RelayoutCombinationScope::DispatchReshape`: like `Dispatch` but restricted to chains whose backward slice contains an expand/collapse_shape. Pure pack/unpack/transpose chains tile fine and are left alone.
  • Run `CombineResultLayoutTransformation` with that scope in the CPU configuration pipeline, after MaterializeDeviceEncoding.
  • Skip `map_load`/`map_store` as `getRootOperation` candidates, so a dispatch with a real compute op plus a `map_store` roots on the compute op.
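
The root-selection rule in the last bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: ops are plain strings, `pick_root` is a hypothetical helper (not IREE's `getRootOperation`), and "pick the last remaining op" is a stand-in for whatever priority the real function applies. Only the skip rule for `map_load`/`map_store` comes from this PR.

```python
# Toy model: ops in a dispatch are represented by their names only.
# RELAYOUT_ONLY and pick_root are hypothetical, for illustration.
RELAYOUT_ONLY = {"map_load", "map_store"}


def pick_root(dispatch_ops):
    """Return the last op that is not a map_load/map_store, or None."""
    for op in reversed(dispatch_ops):
        if op in RELAYOUT_ONLY:
            # Relayout-only ops are never root candidates.
            continue
        return op
    return None


# A dispatch with a compute op followed by a map_store roots on the compute op.
# "linalg.generic" is a stand-in for any real compute op.
root = pick_root(["linalg.generic", "map_store"])
print(root)
```

Under this model a dispatch that contains nothing but `map_store` has no root candidate at all; how the real pipeline handles that case is not stated in this PR.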

Inert for existing (row-major) CPU encodings.
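
To make the scope restriction and the inertness claim concrete, here is a minimal sketch of the decision, assuming a relayout chain is given as a list of op names walked backward from the result. The function `should_combine`, the string scope names, and the assumption that plain `Dispatch` combines any non-empty chain are illustrative, not the actual `CombineResultLayoutTransformation` logic.

```python
# Toy model of the DispatchReshape restriction described above.
# All names here are hypothetical; only the rule "combine only if the
# backward slice contains an expand/collapse_shape" comes from the PR.
RESHAPE_OPS = {"expand_shape", "collapse_shape"}


def should_combine(backward_slice, scope):
    """Decide whether a relayout chain is folded into a single map_store."""
    if not backward_slice:
        return False
    if scope == "Dispatch":
        # Assumption: the unrestricted scope combines any relayout chain.
        return True
    if scope == "DispatchReshape":
        # Restricted scope: require at least one reshape in the chain.
        return any(op in RESHAPE_OPS for op in backward_slice)
    raise ValueError(f"unknown scope: {scope}")


# Row-major style chain: pack only, tiles fine, left alone.
row_major_chain = ["pack"]
# Swizzled chain: result <- transpose <- expand_shape <- pack.
swizzled_chain = ["transpose", "expand_shape", "pack"]

print(should_combine(row_major_chain, "DispatchReshape"))
print(should_combine(swizzled_chain, "DispatchReshape"))
```

In this model the pure `pack` chain is rejected under `DispatchReshape`, which matches the statement that the change is inert for existing row-major encodings, while the chain containing `expand_shape` is accepted for folding.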

Progress towards #24515.

@bjacob marked this pull request as ready for review June 1, 2026 15:27
@bjacob force-pushed the users/bjacob/cpu-encoding-combine-layout branch from c531378 to 3a9e9b4, June 3, 2026 18:46
@bjacob changed the base branch from users/bjacob/cpu-hoist-alloc-cap to main, June 3, 2026 18:48

@egebeysel (Contributor) left a comment

Thanks, LGTM in general, just 2 questions 😄

Comment thread compiler/src/iree/compiler/Codegen/Utils/CPUUtils.cpp
Comment thread compiler/src/iree/compiler/Codegen/LLVMCPU/Passes.cpp
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
@bjacob force-pushed the users/bjacob/cpu-encoding-combine-layout branch from 3a9e9b4 to 5a6b922, June 5, 2026 15:08
@bjacob merged commit d4ee61f into main, Jun 5, 2026
67 checks passed
@bjacob deleted the users/bjacob/cpu-encoding-combine-layout branch, June 5, 2026 16:20